Transporter Performance
General Best Practices
Consider these best practices while using the Tidal Automation Transporter:
- Use a server-side filter to read specific jobs.
- Run only one instance of the Transporter at a time on a machine.
- Keep the number of top-level groups small.
- Transport during off-peak hours or when Client Manager usage is significantly lower.
Transporter Job Read Options
Configuration options are available to improve performance for unfiltered job reads. Several options are provided for flexibility, and they may require tuning for the customer environment. For tuning purposes, it is best to run the Transporter in debug mode with an open console so that you can see how the reads are performing.
To run the Transporter in debug mode, add XPORTER_DEBUG=YES to the Transporter.props file and run the transporter.cmd script located in the bin directory.
The REST call job.getList has been replaced with these options.
Parameters Configured via Transporter.props
Only one of these parameters should be set to true at a time:
- READJOBS_PAGINATED
- READJOBS_ALL
- READJOBS_BATCHES
The READ_BATCHES parameter applies only when READJOBS_PAGINATED or READJOBS_BATCHES is enabled.
If none of these parameters is set, reads default to READ_BATCHES=500 and READJOBS_BATCHES=true.
- The READ_BATCHES parameter sets the page or batch size for paginated or batched reads.
- The READJOBS_PAGINATED parameter determines whether to read jobs in pages.
- The READJOBS_BATCHES parameter determines whether to read jobs in batches.
- The READJOBS_ALL parameter determines whether to read all jobs in a single request, given the minimum and maximum job ID.
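The rules above can be sketched as a small check to run against a Transporter.props file before a transport. This is an illustrative helper, not part of the Transporter: the function names are hypothetical, the file is assumed to contain simple KEY=VALUE lines, and falling back to READ_BATCHES=500 when that key is missing is an assumption based on the documented default.

```python
# Hypothetical pre-flight check; it only mirrors the rules described above.
MODES = ("READJOBS_PAGINATED", "READJOBS_ALL", "READJOBS_BATCHES")


def parse_props(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_read_mode(props):
    """Return (mode, read_batches) for the given properties."""
    enabled = [m for m in MODES if props.get(m, "").lower() == "true"]
    if len(enabled) > 1:
        raise ValueError(
            "Only one READJOBS_* parameter should be true: " + ", ".join(enabled)
        )
    # Assumption: a missing READ_BATCHES falls back to the documented default of 500.
    read_batches = int(props.get("READ_BATCHES", 500))
    if not enabled:
        # Documented default when none of the parameters is set.
        return "READJOBS_BATCHES", read_batches
    mode = enabled[0]
    if mode == "READJOBS_ALL":
        # READ_BATCHES applies only to paginated or batched reads.
        return mode, None
    return mode, read_batches


if __name__ == "__main__":
    sample = "READ_BATCHES=1000\nREADJOBS_PAGINATED=true\n"
    print(resolve_read_mode(parse_props(sample)))
```

Running a check like this before a large transport catches the case where two read modes are enabled at once.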
READJOBS_PAGINATED
READJOBS_PAGINATED configures the Client Manager to return job data in pages, with the page size set by the READ_BATCHES value.
Example: Setting READ_BATCHES=1000 and READJOBS_PAGINATED=true tells the Client Manager to return job data in batches of 1000. This approach reduces the overhead on the Client Manager because data is sent in smaller batches. Increasing the READ_BATCHES value reduces the number of requests sent to the Client Manager, since the jobs are returned in larger batches.
Note: This approach may be less beneficial when there are many jobs (50K or more). The batching is done at the Client Manager level.
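The trade-off between page size and request count can be estimated with a quick calculation. The sketch below assumes one request per page and no extra terminating request, which is an assumption rather than documented Transporter behavior. The function name is hypothetical.

```python
import math


def paginated_request_count(total_jobs, read_batches):
    """Estimate the number of page requests needed for a paginated read.

    Assumption: exactly one request per page of READ_BATCHES jobs.
    """
    if read_batches <= 0:
        raise ValueError("READ_BATCHES must be a positive integer")
    return math.ceil(total_jobs / read_batches)


if __name__ == "__main__":
    for size in (500, 1000, 5000):
        print(size, paginated_request_count(50_000, size))
```

Under this assumption, doubling READ_BATCHES roughly halves the number of requests, at the cost of a larger payload per response.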
READJOBS_BATCHES
READJOBS_BATCHES reads jobs by ranges of job IDs, where the size of each range is set by READ_BATCHES.
Example: If you have 50,000 job records whose job IDs start at 1 and end at 50000, and you have set READ_BATCHES=1000 and READJOBS_BATCHES=true, requests are sent to the Client Manager to query job records in ranges, until no more records are returned, as follows:
jobid >= 1 and jobid <= 1001
jobid >= 1002 and jobid <= 2002
jobid >= 2003 and jobid <= 3003
…
If all the job IDs are sequential and start at 1, each batch request returns roughly 1000 records. However, if there are large gaps in the job IDs (due to mass job deletes, for example), a request may return fewer results, depending on how many job IDs fall within that range. If you run the Transporter in debug mode and find that requests return very few or 0 records for a given READ_BATCHES value, increase the value to reduce the number of requests that return few or no results.
Note: This approach appears to be more beneficial when there are many job records (50K or more).
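The effect of gaps in the job IDs can be explored offline with a small simulation. The sketch below reproduces the range pattern from the example (1 to 1001, 1002 to 2002, and so on) and counts how many IDs fall into each range. The function names are hypothetical, and stopping at the highest known job ID is an assumption made for the simulation, not a description of how the Transporter ends its reads.

```python
from bisect import bisect_left, bisect_right


def batch_ranges(first_id, last_id, read_batches):
    """Yield inclusive (low, high) job ID ranges in the pattern shown above."""
    low = first_id
    while low <= last_id:
        high = low + read_batches
        yield low, high
        low = high + 1


def records_per_request(job_ids, read_batches):
    """Return (low, high, count) for each simulated range request."""
    ids = sorted(job_ids)
    if not ids:
        return []
    return [
        (low, high, bisect_right(ids, high) - bisect_left(ids, low))
        for low, high in batch_ranges(ids[0], ids[-1], read_batches)
    ]


if __name__ == "__main__":
    # Hypothetical data: two blocks of job IDs separated by a large gap.
    job_ids = list(range(1, 1001)) + list(range(40001, 41001))
    for size in (1000, 10000):
        results = records_per_request(job_ids, size)
        empty = sum(1 for _, _, count in results if count == 0)
        print(f"READ_BATCHES={size}: {len(results)} requests, {empty} empty")
```

With sparse job IDs like these, a larger READ_BATCHES value produces fewer requests overall and fewer requests that return nothing, which matches the tuning advice above.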
READJOBS_ALL
READJOBS_ALL reads all jobs based on the first and last job ID, so all jobs are read in a single request. This differs from the job.getList call: both return all jobs, but this request adds a query condition, which seems to produce better performance. However, because all records are returned in a single request, the Client Manager must process all the records before sending them to the Transporter.
Note: If there are many job records, the overhead on the Client Manager may be too high.